Gently Introduce to Variable Selection using Bayesian Approach
This example is based on section 18.4 in Kruschke, 2015.
A Bayesian approach variable selection in regression analysis can be
done by adding a binary parameter for each slope:
\[y_i = \beta_0 + \sum_j \delta_j \ \beta_j
x_{i,j},\]
with \[\delta_j = 0, \ 1.\]
Every combination of values \(\delta_1, \ ..., \delta_k\) gives a submodel. Simple priors for delta-indicators are independent Bernoulli distributions: \(\delta \sim \text{dbern}(0.5)\). Then posterior probabilities \(P(\delta_j = 1\mid D)\) show importance of the corresponding predictors \(x_i\).
The diagram of such model is an easy generalization of regression
model.
Since parameters \(\delta_j\) are
discrete, we need to use JAGS: library(runjags) instead of
Stan
library(runjags)The SAT data includes:
State: 50 states,SATV: SAT verbal; SATM: SAT math;
SATT: SAT total scoreSpend: average spending per pupilPrcntTake: percentage of students who took the
testStuTeaRat: the average student teacher ratio in each
state andSalary: the average salary of the teachersThese latter four variables are also plausible predictors of SAT score.
dta <- read.csv("data/Guber1999data.csv")
names(dta)[1] "State"     "Spend"     "StuTeaRat" "Salary"    "PrcntTake"
[6] "SATV"      "SATM"      "SATT"     DT::datatable(dta)We now let look at the correlation matrix among 4 predictors.
dta %>%
  select(Spend, PrcntTake, StuTeaRat, Salary) %>%
  cor(.) %>%
  round(., 3)           Spend PrcntTake StuTeaRat Salary
Spend      1.000     0.593    -0.371  0.870
PrcntTake  0.593     1.000    -0.213  0.617
StuTeaRat -0.371    -0.213     1.000 -0.001
Salary     0.870     0.617    -0.001  1.000That Salary is strongly correlated with Spend, and therefore, a model that includes both Salary and Spend will show a strong trade-off between those predictors and consequently will show inflated uncertainty in the regression coefficients for either one. Should only one or the other be included, or both, or neither?
One more point, response variable is average high school SAT score per state. One of the predictors is amount of money spent by the state on schools. Plotting amount of money spent against SAT scores shows negative slope.
plot(dta$Spend,dta$SATT)Thus, should we cut funding for school? Because even with the funding, SAT would be reduced disproportional with the budget spent!!
y <- dta[,'SATT']
x <- as.matrix(dta[ , c("Spend","PrcntTake","StuTeaRat","Salary")])
#prepare data in JAGS
dataList <- list(Ntotal = length(y),
                 y = y,
                 x = x,
                 Nx = ncol(x))Description of the model:
\[y_i = \beta_0 + \delta_j \ \beta_j x_{i,j} + \epsilon_i \\ y_i \sim t(\mu_i, \sigma, \nu) \\ \\ \text{where mean, scale and degree of freedom, respectively, would be} \\ \mu_i = \beta_0 + \delta_j \ \beta_j x_{i,j} \\ \beta_0 \sim N(0, \frac{1}{1/2^2}) \ ; \ \beta_j \sim t(0, 1/\sigma_{\beta}^2, \nu = 1) \\ \sigma \sim Unif(L, H) \\ \nu = \gamma^{\prime} + 1, \ \text{where} \ \gamma^{\prime} \sim exp(\lambda) \\ \delta_j \sim Bernoulli(0.5) \ ; \ \sigma_{\beta} \sim \text{Gamma(shape, rate)}\]
modelString = "
    # Standardize the data:
    data {
        ym <- mean(y)
        ysd <- sd(y)
        for (i in 1:Ntotal) {
            zy[i] <- (y[i] - ym) / ysd
        }
        for (j in 1:Nx) {
            xm[j]  <- mean(x[,j])
            xsd[j] <-   sd(x[,j])
            for (i in 1:Ntotal) {
                zx[i,j] <- (x[i,j] - xm[j]) / xsd[j]
            }
        }
    }
    # Specify the model for standardized data:
    model {
        for (i in 1:Ntotal) {
            zy[i] ~ dt(zbeta0 + sum(delta[1:Nx] * zbeta[1:Nx] * zx[i,1:Nx] ) ,  1/zsigma^2 , nu)
        }
        # Priors vague on standardized scale:
        zbeta0 ~ dnorm(0, 1/2^2)
        for (j in 1:Nx) {
            zbeta[j] ~ dt(0 , 1/sigmaBeta^2 , 1) 
            delta[j] ~ dbern( 0.5 )
        }
        zsigma ~ dunif(1.0E-5 , 1.0E+1)
        
        # sigmaBeta <- 2.0 ## uncomment one of the following specifications for sigmaBeta
        # sigmaBeta ~ dunif( 1.0E-5 , 1.0E+2 )
        sigmaBeta ~ dgamma(1.1051,0.1051) # mode 1.0, sd 10.0
        # sigmaBeta <- 1/sqrt(tauBeta) ; tauBeta ~ dgamma(0.001,0.001) 
        nu ~ dexp(1/30.0)
        
        # Transform to original scale:
        beta[1:Nx] <- (delta[1:Nx] * zbeta[1:Nx] / xsd[1:Nx])*ysd
        beta0 <- zbeta0*ysd  + ym - sum(delta[1:Nx] * zbeta[1:Nx] * xm[1:Nx] / xsd[1:Nx])*ysd
        sigma <- zsigma*ysd
    }
"parameters <- c("beta0",  "beta",  "sigma", "delta", "sigmaBeta", "zbeta0", "zbeta", "zsigma", "nu" )
runjagsMethod <- "parallel"  # change to "rjags" in case of working on 1-core CPU 
# run JAGSrunJagsOut <- run.jags(method = runjagsMethod,
                       model = modelString, 
                       monitor = parameters, 
                       data = dataList,
                       n.chains = 3, #nChains <- 3
                       adapt = 500, #adaptSteps <- 500
                       burnin = 1000, #burnInSteps <- 1000
                       sample = ceiling(15000/3), #numSavedSteps <- 15000;  nChains <- 3
                       thin = 25, #thinSteps <- 25
                       summarise = FALSE,
                       plots = FALSE)# generate summary info for all parameters
summary(runJagsOut)Calculating summary statistics...
Note: The monitored variable 'delta[2]' appears to be
non-stochastic; it will not be included in the convergence
diagnostic
Calculating the Gelman-Rubin statistic for 18 variables....
Error : The "modeest" package is required to calculate the mode of continuous variables               Lower95        Median      Upper95          Mean
beta0     944.66400000  1.011275e+03  1.11231e+03  1.018969e+03
beta[1]    -0.00128211  7.803640e+00  1.91766e+01  7.314723e+00
beta[2]    -3.26471000 -2.761835e+00 -2.22909e+00 -2.753613e+00
beta[3]    -5.57611000  0.000000e+00  4.47374e-03 -6.349370e-01
beta[4]    -0.26275600  0.000000e+00  3.55510e+00  3.305153e-01
sigma      24.62630000  3.214530e+01  4.10201e+01  3.238811e+01
delta[1]    0.00000000  1.000000e+00  1.00000e+00  6.138000e-01
delta[2]    1.00000000  1.000000e+00  1.00000e+00  1.000000e+00
delta[3]    0.00000000  0.000000e+00  1.00000e+00  1.732667e-01
delta[4]    0.00000000  0.000000e+00  1.00000e+00  2.208667e-01
sigmaBeta   0.02406400  1.127925e+00  7.94953e+00  2.198454e+00
zbeta0     -0.12437500 -1.292225e-04  1.29719e-01  1.964852e-04
zbeta[1]   -7.77130000  2.129285e-01  1.10354e+01  5.264458e-01
zbeta[2]   -1.16775000 -9.878755e-01 -7.97319e-01 -9.849344e-01
zbeta[3]  -24.01480000 -8.063585e-02  2.36377e+01 -2.338195e-01
zbeta[4]  -17.73010000  1.036410e-01  1.87959e+01  3.378339e-01
zsigma      0.32913900  4.296320e-01  5.48246e-01  4.328772e-01
nu          1.91471000  2.685290e+01  9.61985e+01  3.558561e+01
                   SD Mode        MCerr MC%ofSD SSeff       AC.250
beta0     43.89744488   NA 0.4821033688     1.1  8291  0.019090161
beta[1]    7.04248924   NA 0.1017558219     1.4  4790  0.064596992
beta[2]    0.27026073   NA 0.0033981576     1.3  6325  0.053310641
beta[3]    1.74666227   NA 0.0151369398     0.9 13315 -0.009957991
beta[4]    0.99368906   NA 0.0093047567     0.9 11405  0.010317963
sigma      4.16357566   NA 0.0387315148     0.9 11556  0.016315434
delta[1]   0.48689359    1 0.0078742141     1.6  3823  0.085242124
delta[2]   0.00000000    1           NA      NA    NA           NA
delta[3]   0.37849026    0 0.0034208127     0.9 12242 -0.004375483
delta[4]   0.41484462    0 0.0038539951     0.9 11586  0.010031033
sigmaBeta  3.24829579   NA 0.0687169662     2.1  2235  0.115133738
zbeta0     0.06462281   NA 0.0005367156     0.8 14497 -0.007738634
zbeta[1]   5.53722794   NA 0.2424652404     4.4   522  0.502781355
zbeta[2]   0.09666902   NA 0.0012154802     1.3  6325  0.053310325
zbeta[3]  18.13696053   NA 0.5572712553     3.1  1059  0.319423413
zbeta[4]  14.61246772   NA 0.3817438159     2.6  1465  0.293091432
zsigma     0.05564748   NA 0.0005176588     0.9 11556  0.016315483
nu        30.03665106   NA 0.2442544957     0.8 15122 -0.003220931
               psrf
beta0     1.0000092
beta[1]   1.0001584
beta[2]   1.0002489
beta[3]   1.0006953
beta[4]   1.0006186
sigma     1.0000140
delta[1]  1.0006266
delta[2]         NA
delta[3]  1.0008028
delta[4]  1.0006483
sigmaBeta 1.0016839
zbeta0    1.0001875
zbeta[1]  1.0614080
zbeta[2]  1.0002489
zbeta[3]  1.0456877
zbeta[4]  1.0327248
zsigma    1.0000147
nu        0.9999383# plot all params
plot(runJagsOut,
     plot.type = c("trace", "ecdf", "histogram", "autocorr"))Generating summary statistics and plots (these will NOT be
saved for reuse)...
Calculating summary statistics...
Note: The monitored variable 'delta[2]' appears to be
non-stochastic; it will not be included in the convergence
diagnostic
Calculating the Gelman-Rubin statistic for 18 variables....
Error : The "modeest" package is required to calculate the mode of continuous variablesFind posterior probability of different combinations of predictors by calculating frequencies with which they appeared in MCMC.
trajectoriesDelta <- as.matrix(runJagsOut$mcmc[,7:10])
head(trajectoriesDelta)     delta[1] delta[2] delta[3] delta[4]
[1,]        0        1        0        0
[2,]        1        1        0        0
[3,]        1        1        0        0
[4,]        1        1        0        0
[5,]        1        1        1        0
[6,]        0        1        0        0Nchain <- nrow(trajectoriesDelta)(config1 <- sum(apply(trajectoriesDelta, 1, function(z) prod(z==c(1,1,0,0))))/Nchain)[1] 0.4738667(config2 <- sum(apply(trajectoriesDelta, 1,function(z) prod(z==c(1,0,0,0))))/Nchain)[1] 0(config3 <- sum(apply(trajectoriesDelta, 1, function(z) prod(z==c(0,1,0,0))))/Nchain)[1] 0.2154667(config4 <- sum(apply(trajectoriesDelta, 1, function(z) prod(z==c(1,1,1,1))))/Nchain)[1] 0.02186667(inclSpend<-sum(trajectoriesDelta[,1]==1)/Nchain) #Spend[1] 0.6138(inclPrcntTake<-sum(trajectoriesDelta[,2]==1)/Nchain) #PrcntTake[1] 1(inclStueTeaRat<-sum(trajectoriesDelta[,3]==1)/Nchain) #StueTeaRat[1] 0.1732667(inclSalary<-sum(trajectoriesDelta[,4]==1)/Nchain) #Salary[1] 0.2208667Spend (0.6138) and
PrcntTake (1).PrcntTake is included with
non-stochastic probability of 1, standard deviation of \(\delta_2\) is zero.Warning. Posterior probabilities of inclusion may be very sensitive to prior information.
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hai-mn/hai-mn.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Nguyen (2022, Feb. 14). HaiBiostat: Series 5 of 10 -- Introduction to Selection of Variables using Bayesian Approach. Retrieved from https://hai-mn.github.io/posts/2022-02-14-Bayesian methods - Series 5 of 10/
BibTeX citation
@misc{nguyen2022series,
  author = {Nguyen, Hai},
  title = {HaiBiostat: Series 5 of 10 -- Introduction to Selection of Variables using Bayesian Approach},
  url = {https://hai-mn.github.io/posts/2022-02-14-Bayesian methods - Series 5 of 10/},
  year = {2022}
}